TIME SERIES ANALYSIS - SALES OF SHAMPOO

***Time Series Chart using Plotly library

***Not all days are evenly represented in the dataframe, and we would like to forecast shampoo sales for the subsequent months, so the data need to be resampled.

**The mean, a measure of central tendency, is used when resampling to generate the missing days in the three months. First, the missing days are represented with NaN.
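The resampling step described above might look like the following minimal sketch. The sales figures and the variable names `monthly` and `daily` are made up for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical monthly sales, one observation on the first day of each month
# (illustrative values only, not the real shampoo data)
monthly = pd.Series(
    [100.0, 162.0, 134.0],
    index=pd.date_range("2021-01-01", periods=3, freq="MS"),
    name="sales",
)

# Up-sample to daily frequency; mean() of an empty daily bin is NaN,
# so every day without an observation is represented as NaN
daily = monthly.resample("D").mean()

print(daily.head())
print("missing days:", int(daily.isna().sum()))
```

Only the three original observations carry values at this point; the remaining days are placeholders waiting to be filled.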

Having generated the missing days using the mean, we then need to interpolate the missing values using the methods below:

[A] Up-sampling frequency method {i.e., generating daily data from monthly data}.

We can interpolate the missing values at this new frequency. The interpolate() function of the pandas library is used for this. We use linear interpolation, which draws a straight line between the available data points, starting on the first day of the month (2021-01-01), and fills in values at the chosen frequency from this line.
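As a rough sketch of linear interpolation on up-sampled data, assuming made-up monthly values (the names `monthly`, `daily` and `daily_linear` are illustrative only):

```python
import pandas as pd

# Hypothetical monthly sales (illustrative values only)
monthly = pd.Series(
    [100.0, 162.0, 134.0],
    index=pd.date_range("2021-01-01", periods=3, freq="MS"),
    name="sales",
)

# Up-sample to daily frequency, leaving NaN on the days with no observation
daily = monthly.resample("D").mean()

# Fill each NaN from the straight line joining the neighbouring known values
daily_linear = daily.interpolate(method="linear")

print(daily_linear.head())
```

With these numbers, January rises by 2 units per day (from 100 on 2021-01-01 to 162 on 2021-02-01), so the value filled in for 2021-01-02 is 102.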

***Types of interpolation: linear; polynomial/spline; quadratic; nearest; slinear; zero; cubic.

***Interpolation using the spline or polynomial type - the interpolation fits a polynomial (a cubic, for example) to the points around the missing values. This method can be much slower than linear interpolation, but it usually gives the best results.

***Spline interpolation: This creates more curves and looks more natural on many datasets. Using spline interpolation requires you to specify the order (the degree of the polynomial pieces); we use 2.
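A spline interpolation of order 2 could be sketched as below. This assumes SciPy is installed, since pandas relies on it for the "spline" method, and it again uses made-up monthly values with illustrative variable names.

```python
import pandas as pd

# Hypothetical monthly sales (illustrative values only)
monthly = pd.Series(
    [100.0, 162.0, 134.0, 150.0, 171.0, 140.0],
    index=pd.date_range("2021-01-01", periods=6, freq="MS"),
    name="sales",
)

# Up-sample to daily frequency, leaving NaN on the days with no observation
daily = monthly.resample("D").mean()

# Fill the NaN values from a spline of order 2 (requires SciPy);
# the original monthly observations are left unchanged
daily_spline = daily.interpolate(method="spline", order=2)

print(daily_spline.head())
```

Plotting `daily_spline` next to a linear fill of the same data is one way to see the extra curvature the spline introduces.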

NOTE***Quadratic and spline interpolation differ slightly in terms of the cumulative data.

***Conclusion: We have been able to forecast the sales of shampoo for 2022 and 2023. The data displayed on the charts are monthly data from January to December for 2021, 2022 and 2023.

[B] ***Two ways of down-sampling frequency. Down-sampling frequency {i.e., generating quarterly data from monthly data}.

The shampoo sales data are presented monthly, but we prefer the data to be quarterly. The year can be divided into 4 business quarters of 3 months apiece. The resample() function will then group all observations by the new frequency. We need to decide how to create a new quarterly value from each group of 3 records. We shall use the mean() function to calculate the average monthly shampoo sales for each quarter.
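A minimal sketch of the quarterly down-sampling, using made-up monthly values. It uses the quarter-start alias "QS" as an assumption, so each quarter is labelled by its first day; the text above does not name a specific alias.

```python
import pandas as pd

# Hypothetical monthly sales for one year (illustrative values only)
monthly = pd.Series(
    [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0, 120.0],
    index=pd.date_range("2021-01-01", periods=12, freq="MS"),
    name="sales",
)

# Group the 12 months into 4 quarters and average the 3 records in each group
quarterly = monthly.resample("QS").mean()

print(quarterly)
```

Here the first quarter averages 10, 20 and 30, giving 20; the other quarters follow the same pattern.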

***Down-sampling frequency {i.e., generating yearly data from monthly data}.

We can also turn monthly data into yearly data. Down-sample the data using the alias 'A' for year-end frequency (pandas 2.2 and later spell this alias 'YE'), and this time use sum() to calculate the total sales for each year.
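The yearly totals might be computed as in the sketch below, on made-up data. As an assumption for portability, it uses the year-start alias "YS" instead of the year-end alias, because the spelling of the year-end alias differs between pandas versions. The totals are the same; only the date label of each year changes.

```python
import pandas as pd

# Hypothetical monthly sales for two years: 1.0, 2.0, ..., 24.0
# (illustrative values only)
monthly = pd.Series(
    [float(v) for v in range(1, 25)],
    index=pd.date_range("2021-01-01", periods=24, freq="MS"),
    name="sales",
)

# Group the months by calendar year and add them up
yearly = monthly.resample("YS").sum()

print(yearly)
```

The result has one row per year, labelled 2021-01-01 and 2022-01-01, holding the sum of that year's twelve monthly values.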

***Conclusion: Interpolation generated the missing days, which aided in forecasting the remaining days, months and years {2022, 2023}.